Central Processing Unit Architecture


Central Processing Unit Architecture, Hebrew year 5763, Semester A, March 2007
Hugo Guterman
Arch. CPU L5 Pipeline II

Outline
- More pipelining
- Control Hazards
- Registers and Memory
- Branch Prediction
- Exceptions and interrupts
- Examples

Taxonomy of Hazards
Control Hazards - Branches
Basic Pipeline
Branch Hazards
Branch Hazards (continued)
Solution: assume branch not taken
What to do if branch taken?
What happens when branch is taken?
Side effects
Side effects (continued)
Move the branch computation forward
Move the branch computation further forward
Result: new improved MIPS Datapath
Pipeline Idiosyncrasies
Rewrite the code for delay slot
Problems with delay slot
Datapath with branch logic
Problems with delay slot (continued)
Branch prediction is better?
Prediction of non-taken
Branch misprediction
How to improve?
Dynamic Branch Prediction
1-bit Branch Prediction
1-bit Branch Prediction (continued)

Dynamic Branch Prediction
Solution: a 2-bit scheme that changes the prediction only after mispredicting twice.
[State diagram: four states, two labeled "Predict Taken" and two labeled "Predict Not Taken", with transitions labeled T (taken) and NT (not taken).]

Need Address at Same Time as Prediction
- Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch address (if taken).
- Note: the entry must be checked for a branch match now, since a wrong branch address can't be used.
- Return instruction addresses are predicted with a stack.
[Diagram labels: Predicted PC; Branch Prediction: Taken or Not Taken.]
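As a rough illustration of why two bits help, the following Python sketch compares a 1-bit predictor with a 2-bit one on a made-up loop branch. It assumes the common saturating-counter realization of the 2-bit scheme (states 0 to 3, predict taken in states 2 and 3); the slide's exact state diagram may differ. The class names, the initial states, and the trace are hypothetical and not taken from the lecture.

```python
class OneBitPredictor:
    """Predicts that the branch repeats its most recent outcome."""

    def __init__(self, last_taken=True):
        self.last_taken = last_taken  # initial guess is an assumption

    def predict(self):
        return self.last_taken

    def update(self, taken):
        self.last_taken = taken


class TwoBitPredictor:
    """2-bit saturating counter (assumed variant of the 2-bit scheme).

    States 0 and 1 predict not taken, states 2 and 3 predict taken.
    """

    def __init__(self, state=3):
        self.state = state  # start in "strongly taken" (an assumption)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of actual outcomes and count misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken at loop
# exit, with the whole loop executed twice.
trace = ([True] * 9 + [False]) * 2

one_bit_misses = count_mispredictions(OneBitPredictor(), trace)
two_bit_misses = count_mispredictions(TwoBitPredictor(), trace)
print(one_bit_misses, two_bit_misses)
```

On this trace the 1-bit predictor misses at each loop exit and again when the loop is re-entered, while the 2-bit counter misses only at the exits, because a single not-taken outcome is not enough to flip its prediction.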

What makes pipelines hard to implement
Exceptions & Interrupts
Exception Flow
Flow of instructions during exception
Characterization of exceptions and interrupts
Types of exceptions
Stopping and Restarting Execution
Precise vs. Imprecise Exceptions
Precise vs. Imprecise Exceptions (continued)
Exceptions and CPU Architecture
Multiple Exceptions
Multiple Exceptions (continued)
Exceptions
Performance of Pipelined Systems
Data dependencies
Data dependencies (continued)
Branch delay slot
Branch delay slot (continued)
Bypass Paths
Bypass Paths (continued)
Loop unrolling
Loop unrolling (continued)
Loop unrolling (continued)
Code Performance
Code Performance (continued)
Machine Performance
Machine Performance (continued)
Machine Performance (2)

Machine Performance (2) (continued)

Pipeline Hazards Again
[Pipeline diagrams, one per hazard type:
- Structural Hazard (stages: I-Fetch, ID, MemOpFetch, OpFetch, Exec, Store)
- Control Hazard (Jump)
- RAW (read after write) Data Hazard (stages: IF, ID, EX, Mem, WB)
- WAW (write after write) Data Hazard
- WAR (write after read) Data Hazard (stages: IF, ID, OF, Ex, Mem and IF, ID, OF, Ex, RS)]

Data Hazards
- Avoid some by design:
  - eliminate WAR by always fetching operands early (DCD) in the pipe
  - eliminate WAW by doing all WBs in order (last stage, static)
- Detect and resolve the remaining ones: stall or forward (if possible)
[Pipeline diagrams labeled RAW Data Hazard (IF ID EX Mem WB), WAW Data Hazard, and RAW Data Hazard (IF ID OF Ex Mem / IF ID OF Ex RS).]

Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
- A RAW hazard exists on register $\rho$ if $\rho \in \mathrm{Rregs}(i) \cap \mathrm{Wregs}(j)$.
  Keep a record of pending writes (for instructions in the pipe) and compare it with the operand registers of the current instruction. When an instruction issues, reserve its result register. When an operation completes, remove its write reservation.
- A WAW hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Wregs}(j)$.
- A WAR hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Rregs}(j)$.
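The hazard conditions on the Hazard Detection slide are set intersections, so they can be sketched directly with Python sets. The example below is only an illustration under stated assumptions: the function name `find_hazards`, the class `PendingWrites`, and the two sample instructions are hypothetical, and the pending-write record is a bare-bones version of the reservation idea described on the slide, not a model of any particular processor.

```python
def find_hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify hazards between issuing instruction i and in-flight predecessor j.

    Each argument is an iterable of register names.
    """
    return {
        "RAW": set(i_reads) & set(j_writes),   # i reads what j will write
        "WAW": set(i_writes) & set(j_writes),  # i and j write the same register
        "WAR": set(i_writes) & set(j_reads),   # i writes what j still reads
    }


class PendingWrites:
    """Record of result registers reserved by instructions still in the pipe."""

    def __init__(self):
        self.reserved = set()

    def can_issue(self, reads, writes):
        # Block on RAW (an operand has a pending write) and on WAW
        # (the destination already has a pending write).
        raw = set(reads) & self.reserved
        waw = set(writes) & self.reserved
        return not raw and not waw

    def issue(self, writes):
        # When an instruction issues, reserve its result register(s).
        self.reserved |= set(writes)

    def complete(self, writes):
        # When the operation completes, remove its write reservation.
        self.reserved -= set(writes)


# Hypothetical instruction pair:
#   j: add r1, r2, r3   (reads r2, r3; writes r1)
#   i: sub r4, r1, r5   (reads r1, r5; writes r4)
hazards = find_hazards(i_reads=["r1", "r5"], i_writes=["r4"],
                       j_reads=["r2", "r3"], j_writes=["r1"])

board = PendingWrites()
board.issue(["r1"])                                    # j enters the pipe
blocked = not board.can_issue(["r1", "r5"], ["r4"])    # i must wait for r1
board.complete(["r1"])                                 # j writes back
ready = board.can_issue(["r1", "r5"], ["r4"])          # i may now issue
print(hazards, blocked, ready)
```

This sketch only stalls; a design with bypass paths could instead forward the result and let the dependent instruction issue earlier.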

Issues in Pipelined design
- Pipelining. Limitation: issue rate, FU stalls, FU depth.
- Super-pipeline. Limitation: clock skew, FU stalls, FU depth.
  - Issue one instruction per (fast) cycle
  - ALU takes multiple cycles
- Super-scalar. Limitation: hazard resolution.
  - Issue multiple scalar instructions per cycle
- VLIW ("EPIC"). Limitation: packing.
  - Each instruction specifies multiple scalar operations
  - Compiler determines parallelism
- Vector operations. Limitation: applicability.
  - Each instruction specifies a series of identical operations
[Pipeline timing diagrams (IF D Ex M W) accompany each technique.]

Partitioned Instruction Issue (simple Superscalar)
Independent int and FP issue to separate pipelines.
[Block diagram: I-Cache; Inst Issue and Bypass; Int Reg; FP Reg; Operand / Result Busses; Int Unit; Load / Store Unit; FP Add; FP Mul; D-Cache.]
Single Issue Total Time = Int Time + FP Time
Max Speedup: $\dfrac{\text{Total Time}}{\max(\text{Int Time}, \text{FP Time})}$
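To make the partitioned-issue speedup bound concrete, here is a small Python sketch. It assumes the bound is the single-issue total time divided by the longer of the two pipeline times; the function name and the sample times are hypothetical.

```python
def max_speedup(int_time, fp_time):
    """Upper bound on speedup when int and FP work issue to separate pipelines.

    Single-issue total time is int_time + fp_time; with perfect overlap the
    run can be no shorter than the longer of the two parts.
    """
    if int_time < 0 or fp_time < 0 or max(int_time, fp_time) == 0:
        raise ValueError("times must be non-negative and not both zero")
    total_time = int_time + fp_time
    return total_time / max(int_time, fp_time)


# Hypothetical workloads (time units are arbitrary):
balanced = max_speedup(50, 50)    # equal int and FP time
skewed = max_speedup(60, 40)      # mostly integer work
int_only = max_speedup(100, 0)    # no FP work at all
print(balanced, skewed, int_only)
```

The bound reaches 2 only when the integer and FP times are equal, and it falls to 1 when one of the two pipelines has no work.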

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Control and Datapath (together, the Processor), Memory, Input, Output.

FYI: MIPS R3000 clocking discipline
- 2-phase non-overlapping clocks (phi1, phi2)
- Pipeline stage is two (level sensitive) latches
[Timing diagram labels: phi1, phi2; Edge-triggered; phi1, phi2, phi1.]

Central Processing Unit Architecture (36113741), Hebrew year 5768, Semester B, May 2008. Hugo Guterman (hugo@ee.bgu.ac.il). Web site: http://www.ee.bgu.ac.il/~cpuarch